PUMA: Purdue MapReduce Benchmarks Suite
Similar resources
MRBS: A Comprehensive MapReduce Benchmark Suite
MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapRed...
HiBench: A Representative and Comprehensive Hadoop Benchmark Suite
MapReduce and its popular open source implementation, Hadoop, are becoming ubiquitous for Big Data storage and processing. Therefore, it is essential to quantitatively evaluate and characterize the Hadoop deployment through extensive benchmarking. In this paper, we present HiBench [1], a representative and comprehensive benchmark suite for Hadoop, which consists of a set of Hadoop programs...
LNCS 7640 - Euro-Par 2012: Parallel Processing Workshops
MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions...
Improving the Load Balance of MapReduce Operations based on the Key Distribution of Pairs
Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling on MapReduce operations. Each operation represents one invocation of the Map or Reduce function. Scheduling MapReduce operations is difficult due to highly skewed operation loads, no support to collect w...
User-Centric Heterogeneity-Aware MapReduce Job Provisioning in the Public Cloud
Cloud datacenters are becoming increasingly heterogeneous with respect to the hardware on which virtual machine (VM) instances are hosted. As a result, ostensibly identical instances in the cloud show significant performance variability depending on the physical machines that host them. In our case study on Amazon’s EC2 public cloud, we observe that the average execution time of Hadoop MapReduc...